Dimension Reduction Methods of Text Documents by Neural Networks
Abstract
This paper introduces several dimension reduction methods for text document retrieval. It first describes the most commonly used text document retrieval models, then presents an analytical approach and neural network approaches to reducing the dimension of the keyword space. Dimension reduction methods map the keyword space to a much smaller one while preserving document similarity as closely as possible. The result is a saving in the memory needed to represent documents.
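As a rough illustration of reducing a keyword space while keeping similarities close, here is a minimal sketch. The abstract does not name its analytical method, so the use of a truncated SVD here is an assumption, and the toy matrix `docs` and the helper names `reduce_keyword_space` and `cosine_similarity` are hypothetical.

```python
import numpy as np

def reduce_keyword_space(doc_term, k):
    """Project documents (rows) onto the top-k right singular vectors.

    Assumption: truncated SVD stands in for the unnamed analytical approach.
    """
    # Thin SVD: doc_term = U @ diag(s) @ Vt
    _, _, vt = np.linalg.svd(doc_term, full_matrices=False)
    basis = vt[:k].T            # shape (n_keywords, k)
    return doc_term @ basis     # shape (n_docs, k)

def cosine_similarity(rows):
    """Pairwise cosine similarity between the rows of a matrix."""
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    unit = rows / norms
    return unit @ unit.T

# Hypothetical toy collection: 4 documents over 6 keywords (term counts).
docs = np.array([
    [2, 1, 0, 0, 1, 0],
    [1, 2, 0, 0, 0, 1],
    [0, 0, 3, 1, 0, 0],
    [0, 0, 1, 2, 1, 0],
], dtype=float)

reduced = reduce_keyword_space(docs, k=2)   # 6 keyword dimensions -> 2

# Compare document-to-document similarity before and after reduction.
sim_full = cosine_similarity(docs)
sim_reduced = cosine_similarity(reduced)
max_change = np.abs(sim_full - sim_reduced).max()
```

Here `max_change` measures how far any pairwise similarity moved after the reduction, and `reduced` stores 2 values per document instead of 6. How small `k` can be before retrieval quality suffers depends on the collection.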
Related papers
NTC (Neural Text Categorizer): Neural Network for Text Categorization
This research proposes a new neural network for text categorization which uses alternative representations of documents to numerical vectors. Since the proposed neural network is intended originally only for text categorization, it is called NTC (Neural Text Categorizer) in this research. Numerical vectors representing documents for tasks of text mining have inherently two main problems: huge d...
Text Document Retrieval by Document Space Dimension Reduction with Feed-Forward Neural Networks
The paper deals with text document retrieval from a given document collection using neural networks, namely a cascade neural network model, linear and nonlinear Hebbian neural networks, and a linear autoassociative neural network. Using neural networks, it is possible to reduce the dimension of the search space while preserving the highest possible retrieval accuracy.
Effective Dimension Reduction Techniques for Text Documents
Frequent term based text clustering is a text clustering technique that uses frequent term sets and dramatically decreases the dimensionality of the document vector space, thus addressing two problems in particular: the very high dimensionality of the data and the very large size of the databases. The Frequent Term based Clustering algorithm (FTC) has shown significant efficiency compared to some well known text cluster...
Neural dimensionality reduction for document processing
Document processing usually gives rise to high-dimension representation vectors which are redundant and costly to process. Reducing dimensionality would be appropriate, but standard factor analysis methods such as PCA cannot deal with vectors of very high dimension. We have used instead an adaptive neural network technique (the Generalized Hebbian Algorithm) to extract the first principal compo...
Convolutional Neural Networks for Direct Text Deblurring
In this work we address the problem of blind deconvolution and denoising. We focus on restoration of text documents and we show that this type of highly structured data can be successfully restored by a convolutional neural network. The networks are trained to reconstruct high-quality images directly from blurry inputs without assuming any specific blur and noise models. We demonstrate the perf...